Overview

Dataset statistics

Number of variables: 9
Number of observations: 613
Missing cells: 252
Missing cells (%): 4.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 43.2 KiB
Average record size in memory: 72.2 B
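The overview figures can be cross-checked against the per-variable missing counts listed under Alerts. This sketch makes three assumptions:

- The missing-cell percentage is taken over all cells (observations × variables).
- 1 KiB is 1024 bytes.
- The displayed total size is itself rounded, so the record size is only approximate.

```python
# Missing-value counts per column, copied from the Alerts section of this report.
missing_per_column = {
    "Pregnancies": 17,
    "Glucose": 8,
    "SkinThickness": 10,
    "BMI": 8,
    "DiabetesPedigreeFunction": 12,
    "Outcome": 197,
}
n_observations = 613
n_variables = 9

total_missing = sum(missing_per_column.values())
total_cells = n_observations * n_variables
missing_pct = 100 * total_missing / total_cells

# Average record size = total in-memory size / number of rows.
total_size_bytes = 43.2 * 1024  # "43.2 KiB" as displayed (already rounded)
avg_record_bytes = total_size_bytes / n_observations

print(f"Missing cells: {total_missing}")                 # Missing cells: 252
print(f"Missing cells (%): {missing_pct:.1f}%")          # Missing cells (%): 4.6%
print(f"Average record size: {avg_record_bytes:.1f} B")  # Average record size: 72.2 B
```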

Variable types

Numeric: 8
Categorical: 1

Alerts

Outcome has constant value "0.0" [Constant]
Pregnancies is highly correlated with Age [High correlation]
SkinThickness is highly correlated with BMI [High correlation]
BMI is highly correlated with SkinThickness [High correlation]
Age is highly correlated with Pregnancies [High correlation]
SkinThickness is highly correlated with Insulin and 1 other field [High correlation]
Insulin is highly correlated with SkinThickness [High correlation]
Pregnancies has 17 (2.8%) missing values [Missing]
Glucose has 8 (1.3%) missing values [Missing]
SkinThickness has 10 (1.6%) missing values [Missing]
BMI has 8 (1.3%) missing values [Missing]
DiabetesPedigreeFunction has 12 (2.0%) missing values [Missing]
Outcome has 197 (32.1%) missing values [Missing]
Pregnancies has 87 (14.2%) zeros [Zeros]

Reproduction

Analysis started: 2022-09-19 19:47:03.188293
Analysis finished: 2022-09-19 19:47:18.799755
Duration: 15.61 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Pregnancies
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 12
Distinct (%): 2.0%
Missing: 17
Missing (%): 2.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.593959732
Minimum: 0
Maximum: 11
Zeros: 87
Zeros (%): 14.2%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 3
Q3: 6
95-th percentile: 10
Maximum: 11
Range: 11
Interquartile range (IQR): 5
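The Pregnancies quantiles can be reproduced from the value counts in this variable's frequency tables. This assumes the report follows pandas' default linear interpolation for quantiles, as in `Series.quantile`. In the plain-Python sketch below, `quantile_linear` is a hypothetical helper, not a pandas-profiling function.

```python
# Pregnancies value counts from this report's frequency tables
# (596 non-missing rows; the 17 missing values are excluded).
counts = {0: 87, 1: 107, 2: 84, 3: 61, 4: 55, 5: 44,
          6: 40, 7: 37, 8: 25, 9: 25, 10: 21, 11: 10}
values = sorted(v for v, c in counts.items() for _ in range(c))


def quantile_linear(sorted_vals, q):
    """Hypothetical helper: linear interpolation between neighbouring order statistics."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)                           # floor, since pos >= 0
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


levels = [("Minimum", 0.0), ("5-th percentile", 0.05), ("Q1", 0.25),
          ("median", 0.5), ("Q3", 0.75), ("95-th percentile", 0.95),
          ("Maximum", 1.0)]
stats = {name: quantile_linear(values, q) for name, q in levels}
stats["Range"] = stats["Maximum"] - stats["Minimum"]
stats["Interquartile range (IQR)"] = stats["Q3"] - stats["Q1"]

for name, value in stats.items():
    print(f"{name}: {value:g}")
```

For this integer-valued column, each requested quantile falls between two equal order statistics, so the interpolation is invisible. In a continuous column it is not: the BMI 5th percentile of 22.12, for example, is likely an interpolated value.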

Descriptive statistics

Standard deviation: 3.029751149
Coefficient of variation (CV): 0.8430119911
Kurtosis: -0.5444745555
Mean: 3.593959732
Median Absolute Deviation (MAD): 2
Skewness: 0.700753706
Sum: 2142
Variance: 9.179392025
Monotonicity: Not monotonic
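Rebuilding the Pregnancies column from its value counts (596 non-missing rows) also reproduces the descriptive statistics above. This plain-Python sketch assumes pandas' conventions:

- sample variance with an n - 1 denominator;
- bias-corrected skewness (G1) and bias-corrected excess kurtosis (G2);
- an unscaled median absolute deviation.

Under these conventions the results agree with the reported figures.

```python
import math
import statistics

# Pregnancies value counts taken from this report's frequency tables
# (596 non-missing rows; the 17 missing values are left out).
counts = {0: 87, 1: 107, 2: 84, 3: 61, 4: 55, 5: 44,
          6: 40, 7: 37, 8: 25, 9: 25, 10: 21, 11: 10}
x = [v for v, c in counts.items() for _ in range(c)]
n = len(x)

mean = sum(x) / n
s2 = sum((v - mean) ** 2 for v in x)  # sums of powered deviations from the mean
s3 = sum((v - mean) ** 3 for v in x)
s4 = sum((v - mean) ** 4 for v in x)

variance = s2 / (n - 1)               # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                       # coefficient of variation

# Bias-corrected sample skewness (adjusted Fisher-Pearson, G1).
skewness = n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5

# Bias-corrected excess kurtosis (G2); about 0 for normally distributed data.
g2 = n * s4 / s2 ** 2 - 3
kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

# Unscaled median absolute deviation: median of |x - median(x)|.
med = statistics.median(x)
mad = statistics.median(abs(v - med) for v in x)

for name, value in [("Mean", mean), ("Standard deviation", std),
                    ("Variance", variance), ("CV", cv), ("Skewness", skewness),
                    ("Kurtosis", kurtosis), ("MAD", mad), ("Sum", sum(x))]:
    print(f"{name}: {value:.10g}")
```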
[Figure: Histogram with fixed size bins (bins=12)]

Common values
Value               Count  Frequency (%)
1                   107    17.5%
0                   87     14.2%
2                   84     13.7%
3                   61     10.0%
4                   55     9.0%
5                   44     7.2%
6                   40     6.5%
7                   37     6.0%
8                   25     4.1%
9                   25     4.1%
Other values (2)    31     5.1%

Minimum 10 values
Value               Count  Frequency (%)
0                   87     14.2%
1                   107    17.5%
2                   84     13.7%
3                   61     10.0%
4                   55     9.0%
5                   44     7.2%
6                   40     6.5%
7                   37     6.0%
8                   25     4.1%
9                   25     4.1%

Maximum 10 values
Value               Count  Frequency (%)
11                  10     1.6%
10                  21     3.4%
9                   25     4.1%
8                   25     4.1%
7                   37     6.0%
6                   40     6.5%
5                   44     7.2%
4                   55     9.0%
3                   61     10.0%
2                   84     13.7%
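The Frequency (%) column is consistent with dividing each count by all 613 observations, missing values included (for example, 107 / 613 ≈ 17.5%). That is why the Pregnancies percentages add up to less than 100%: the 17 missing rows account for the remainder. A minimal sketch that rebuilds the Common values table under that assumption:

```python
from collections import Counter

# Pregnancies value counts as listed in this report; 17 further rows are missing.
counts = Counter({0: 87, 1: 107, 2: 84, 3: 61, 4: 55, 5: 44,
                  6: 40, 7: 37, 8: 25, 9: 25, 10: 21, 11: 10})
n_missing = 17
n_rows = sum(counts.values()) + n_missing  # 613 observations in total


def frequency(count):
    """Share of *all* rows, missing ones included, formatted like the report."""
    return f"{100 * count / n_rows:.1f}%"


top = counts.most_common(10)                    # ten most frequent values
other_count = sum(counts.values()) - sum(c for _, c in top)
n_other = len(counts) - len(top)

for value, count in top:
    print(value, count, frequency(count))
print(f"Other values ({n_other})", other_count, frequency(other_count))
```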

Glucose
Real number (ℝ≥0)

MISSING

Distinct: 123
Distinct (%): 20.3%
Missing: 8
Missing (%): 1.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 115.9065664
Minimum: 44
Maximum: 191
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 KiB

Quantile statistics

Minimum: 44
5-th percentile: 79
Q1: 97
median: 112
Q3: 131
95-th percentile: 168
Maximum: 191
Range: 147
Interquartile range (IQR): 34

Descriptive statistics

Standard deviation: 26.85241429
Coefficient of variation (CV): 0.2316729339
Kurtosis: -0.0747268786
Mean: 115.9065664
Median Absolute Deviation (MAD): 17
Skewness: 0.5417773278
Sum: 70123.47266
Variance: 721.0521534
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value               Count  Frequency (%)
99                  16     2.6%
125                 14     2.3%
100                 14     2.3%
106                 14     2.3%
112                 13     2.1%
95                  13     2.1%
102                 13     2.1%
108                 12     2.0%
111                 12     2.0%
90                  11     1.8%
Other values (113)  473    77.2%

Minimum 10 values
Value               Count  Frequency (%)
44                  1      0.2%
56                  1      0.2%
57                  1      0.2%
61                  1      0.2%
62                  1      0.2%
65                  1      0.2%
67                  1      0.2%
68                  1      0.2%
71                  4      0.7%
72                  1      0.2%

Maximum 10 values
Value               Count  Frequency (%)
191                 1      0.2%
188                 1      0.2%
184                 2      0.3%
183                 3      0.5%
182                 1      0.2%
181                 1      0.2%
180                 4      0.7%
179                 5      0.8%
178                 1      0.2%
176                 1      0.2%

BloodPressure
Real number (ℝ≥0)

Distinct: 36
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.47567037
Minimum: 40
Maximum: 98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 KiB

Quantile statistics

Minimum: 40
5-th percentile: 54
Q1: 64
median: 70
Q3: 78
95-th percentile: 90
Maximum: 98
Range: 58
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 10.71811548
Coefficient of variation (CV): 0.1499547388
Kurtosis: -0.1728030344
Mean: 71.47567037
Median Absolute Deviation (MAD): 6
Skewness: 0.01294709426
Sum: 43814.58594
Variance: 114.8779994
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=36)]

Common values
Value               Count  Frequency (%)
70                  47     7.7%
74                  44     7.2%
64                  39     6.4%
68                  39     6.4%
72                  37     6.0%
69.10546875         34     5.5%
78                  33     5.4%
80                  32     5.2%
76                  30     4.9%
60                  29     4.7%
Other values (26)   249    40.6%

Minimum 10 values
Value               Count  Frequency (%)
40                  1      0.2%
44                  4      0.7%
46                  1      0.2%
48                  4      0.7%
50                  10     1.6%
52                  10     1.6%
54                  10     1.6%
55                  2      0.3%
56                  12     2.0%
58                  14     2.3%

Maximum 10 values
Value               Count  Frequency (%)
98                  3      0.5%
96                  4      0.7%
95                  1      0.2%
94                  6      1.0%
92                  7      1.1%
90                  15     2.4%
88                  21     3.4%
86                  17     2.8%
85                  6      1.0%
84                  14     2.3%
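The histogram bin counts in this report equal the number of distinct values, capped at 50:

- 12 for Pregnancies, 36 for BloodPressure, 40 for SkinThickness and 41 for Age;
- 50 for Glucose, Insulin, BMI and DiabetesPedigreeFunction, which have more distinct values.

Below is a sketch of equal-width ("fixed size") binning under that rule. The cap of 50 is inferred from these numbers, not read from the pandas-profiling configuration. The helper names are hypothetical.

```python
def fixed_size_bins(values, max_bins=50):
    """Return (bin count, equal-width bin edges) for a numeric sample.

    The bin count is the number of distinct values, capped at max_bins.
    max_bins=50 is an assumption chosen to match the bin counts in this report.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 1, [lo, hi]  # a single distinct value gets one bin
    n_bins = min(len(set(values)), max_bins)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    return n_bins, edges


def bin_counts(values, edges):
    """Count values per bin; bins are half-open [a, b) except the last, which is closed."""
    n_bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * n_bins) if hi > lo else 0
        counts[min(idx, n_bins - 1)] += 1
    return counts


# Stand-in samples with 36 and 123 distinct values (as for BloodPressure and Glucose).
print(fixed_size_bins(list(range(36)))[0])   # 36
print(fixed_size_bins(list(range(123)))[0])  # 50

sample = [40, 50, 50, 60, 98]  # hypothetical values, not taken from this dataset
n, edges = fixed_size_bins(sample)
print(n, edges)
print(bin_counts(sample, edges))             # [3, 1, 0, 1]
```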

SkinThickness
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 40
Distinct (%): 6.6%
Missing: 10
Missing (%): 1.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.16081917
Minimum: 8
Maximum: 47
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 KiB

Quantile statistics

Minimum: 8
5-th percentile: 15
Q1: 20.53645833
median: 20.53645833
Q3: 30
95-th percentile: 41
Maximum: 47
Range: 39
Interquartile range (IQR): 9.463541667

Descriptive statistics

Standard deviation: 8.013968452
Coefficient of variation (CV): 0.3185098386
Kurtosis: -0.1624290884
Mean: 25.16081917
Median Absolute Deviation (MAD): 3.536458333
Skewness: 0.739164622
Sum: 15171.97396
Variance: 64.22369035
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=40)]

Common values
Value               Count  Frequency (%)
20.53645833         205    33.4%
32                  25     4.1%
30                  24     3.9%
23                  19     3.1%
28                  18     2.9%
27                  18     2.9%
18                  18     2.9%
19                  17     2.8%
40                  14     2.3%
22                  14     2.3%
Other values (30)   231    37.7%

Minimum 10 values
Value               Count  Frequency (%)
8                   1      0.2%
10                  4      0.7%
11                  6      1.0%
12                  6      1.0%
13                  8      1.3%
14                  4      0.7%
15                  13     2.1%
16                  5      0.8%
17                  10     1.6%
18                  18     2.9%

Maximum 10 values
Value               Count  Frequency (%)
47                  2      0.3%
46                  6      1.0%
45                  2      0.3%
44                  3      0.5%
43                  4      0.7%
42                  5      0.8%
41                  12     2.0%
40                  14     2.3%
39                  11     1.8%
38                  5      0.8%

Insulin
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 103
Distinct (%): 16.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 89.46528769
Minimum: 22
Maximum: 180
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 KiB

Quantile statistics

Minimum: 22
5-th percentile: 50
Q1: 79.79947917
median: 79.79947917
Q3: 90
95-th percentile: 159.4
Maximum: 180
Range: 158
Interquartile range (IQR): 10.20052083

Descriptive statistics

Standard deviation: 29.52718635
Coefficient of variation (CV): 0.330040702
Kurtosis: 1.647487433
Mean: 89.46528769
Median Absolute Deviation (MAD): 0
Skewness: 1.294112109
Sum: 54842.22135
Variance: 871.854734
Monotonicity: Not monotonic
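Insulin's MAD of 0 follows from its value counts. The value 79.79947917 accounts for 343 of 613 rows (56.0%), more than half. So the median equals that value, and more than half of the absolute deviations from the median are exactly 0. This is also why Q1 and the median coincide.

The minimal sketch below uses hypothetical data with the same property. It computes the unscaled MAD (no 1.4826 normal-consistency factor). That matches the other MAD values in this report, such as 2 for Pregnancies.

```python
import statistics


def median_absolute_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


# Hypothetical 100-row columns (not the Insulin data).
majority = [79.8] * 56 + list(range(100, 144))  # one value fills 56% of rows
minority = [79.8] * 40 + list(range(100, 160))  # the same value fills 40% of rows

print(median_absolute_deviation(majority))  # 0.0
print(median_absolute_deviation(minority))  # positive once no value holds a majority
```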
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value               Count  Frequency (%)
79.79947917         343    56.0%
105                 11     1.8%
140                 8      1.3%
130                 8      1.3%
94                  7      1.1%
100                 7      1.1%
120                 7      1.1%
115                 6      1.0%
110                 6      1.0%
135                 6      1.0%
Other values (93)   204    33.3%

Minimum 10 values
Value               Count  Frequency (%)
22                  1      0.2%
23                  2      0.3%
29                  1      0.2%
32                  1      0.2%
36                  3      0.5%
37                  2      0.3%
38                  1      0.2%
40                  2      0.3%
41                  1      0.2%
42                  1      0.2%

Maximum 10 values
Value               Count  Frequency (%)
180                 6      1.0%
178                 1      0.2%
176                 3      0.5%
175                 3      0.5%
171                 1      0.2%
170                 2      0.3%
168                 4      0.7%
167                 2      0.3%
166                 1      0.2%
165                 4      0.7%

BMI
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 218
Distinct (%): 36.0%
Missing: 8
Missing (%): 1.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.57310046
Minimum: 18.2
Maximum: 46.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 KiB

Quantile statistics

Minimum: 18.2
5-th percentile: 22.12
Q1: 27.1
median: 31.6
Q3: 35.5
95-th percentile: 42.9
Maximum: 46.8
Range: 28.6
Interquartile range (IQR): 8.4

Descriptive statistics

Standard deviation: 6.152830529
Coefficient of variation (CV): 0.194875715
Kurtosis: -0.3933231745
Mean: 31.57310046
Median Absolute Deviation (MAD): 4.2
Skewness: 0.241389441
Sum: 19101.72578
Variance: 37.85732352
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value               Count  Frequency (%)
31.2                10     1.6%
33.3                10     1.6%
31.6                10     1.6%
31.99257812         10     1.6%
32                  10     1.6%
34.2                8      1.3%
30.8                8      1.3%
32.4                8      1.3%
29.7                8      1.3%
30                  7      1.1%
Other values (208)  516    84.2%
(Missing)           8      1.3%

Minimum 10 values
Value               Count  Frequency (%)
18.2                3      0.5%
18.4                1      0.2%
19.1                1      0.2%
19.3                1      0.2%
19.4                1      0.2%
19.5                2      0.3%
19.6                1      0.2%
19.9                1      0.2%
20                  1      0.2%
20.4                2      0.3%

Maximum 10 values
Value               Count  Frequency (%)
46.8                2      0.3%
46.7                1      0.2%
46.3                1      0.2%
46.2                1      0.2%
46.1                2      0.3%
45.7                1      0.2%
45.6                1      0.2%
45.5                1      0.2%
45.3                3      0.5%
45.2                1      0.2%

DiabetesPedigreeFunction
Real number (ℝ≥0)

MISSING

Distinct: 424
Distinct (%): 70.5%
Missing: 12
Missing (%): 2.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4283527454
Minimum: 0.078
Maximum: 1.353
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.136
Q1: 0.236
median: 0.342
Q3: 0.58
95-th percentile: 0.956
Maximum: 1.353
Range: 1.275
Interquartile range (IQR): 0.344

Descriptive statistics

Standard deviation: 0.2666077556
Coefficient of variation (CV): 0.6224023506
Kurtosis: 0.9913229027
Mean: 0.4283527454
Median Absolute Deviation (MAD): 0.146
Skewness: 1.180431806
Sum: 257.44
Variance: 0.07107969536
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value               Count  Frequency (%)
0.261               5      0.8%
0.258               5      0.8%
0.268               5      0.8%
0.692               4      0.7%
0.299               4      0.7%
0.238               4      0.7%
0.245               4      0.7%
0.284               4      0.7%
0.263               4      0.7%
0.254               4      0.7%
Other values (414)  558    91.0%
(Missing)           12     2.0%

Minimum 10 values
Value               Count  Frequency (%)
0.078               1      0.2%
0.084               1      0.2%
0.085               2      0.3%
0.088               2      0.3%
0.089               1      0.2%
0.092               1      0.2%
0.096               1      0.2%
0.1                 1      0.2%
0.101               1      0.2%
0.102               1      0.2%

Maximum 10 values
Value               Count  Frequency (%)
1.353               1      0.2%
1.321               1      0.2%
1.318               1      0.2%
1.292               1      0.2%
1.282               1      0.2%
1.268               1      0.2%
1.251               1      0.2%
1.224               1      0.2%
1.222               1      0.2%
1.213               1      0.2%

Age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 41
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.7862969
Minimum: 21
Maximum: 61
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
median: 28
Q3: 39
95-th percentile: 52
Maximum: 61
Range: 40
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 9.908981607
Coefficient of variation (CV): 0.3117375276
Kurtosis: -0.1004084547
Mean: 31.7862969
Median Absolute Deviation (MAD): 6
Skewness: 0.9129027029
Sum: 19485
Variance: 98.18791649
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=41)]

Common values
Value               Count  Frequency (%)
22                  62     10.1%
21                  54     8.8%
25                  41     6.7%
24                  40     6.5%
28                  31     5.1%
23                  29     4.7%
27                  27     4.4%
26                  27     4.4%
29                  21     3.4%
41                  19     3.1%
Other values (31)   262    42.7%

Minimum 10 values
Value               Count  Frequency (%)
21                  54     8.8%
22                  62     10.1%
23                  29     4.7%
24                  40     6.5%
25                  41     6.7%
26                  27     4.4%
27                  27     4.4%
28                  31     5.1%
29                  21     3.4%
30                  17     2.8%

Maximum 10 values
Value               Count  Frequency (%)
61                  1      0.2%
60                  3      0.5%
59                  2      0.3%
58                  3      0.5%
57                  4      0.7%
56                  2      0.3%
55                  3      0.5%
54                  5      0.8%
53                  3      0.5%
52                  6      1.0%

Outcome
Categorical

CONSTANT
MISSING
REJECTED

Distinct: 1
Distinct (%): 0.2%
Missing: 197
Missing (%): 32.1%
Memory size: 4.9 KiB

Value               Count
0.0                 416

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1248
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
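The Outcome character statistics follow from its 416 identical "0.0" strings. Three characters each gives 1248 in total, and two of every three characters are the digit 0.

The short sketch below uses Python's `unicodedata.category`, which returns Unicode's two-letter General Category codes. The report's "Decimal Number" and "Other Punctuation" correspond to the codes `Nd` and `Po`.

```python
import unicodedata
from collections import Counter

# The Outcome column as summarised above: 416 non-missing values, all "0.0".
values = ["0.0"] * 416

chars = Counter(ch for v in values for ch in v)
categories = Counter(unicodedata.category(ch) for ch in chars.elements())
all_ascii = all(ord(ch) < 128 for ch in chars)  # every character in the ASCII block

print(sum(chars.values()))  # 1248
print(chars)                # Counter({'0': 832, '.': 416})
print(categories)           # Counter({'Nd': 832, 'Po': 416})
print(all_ascii)            # True
```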

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value               Count  Frequency (%)
0.0                 416    67.9%
(Missing)           197    32.1%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value               Count  Frequency (%)
0.0                 416    100.0%

Most occurring characters

Value               Count  Frequency (%)
0                   832    66.7%
.                   416    33.3%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number      832    66.7%
Other Punctuation   416    33.3%

Most frequent character per category

Decimal Number
Value               Count  Frequency (%)
0                   832    100.0%

Other Punctuation
Value               Count  Frequency (%)
.                   416    100.0%

Most occurring scripts

Value               Count  Frequency (%)
Common              1248   100.0%

Most frequent character per script

Common
Value               Count  Frequency (%)
0                   832    66.7%
.                   416    33.3%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               1248   100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
0                   832    66.7%
.                   416    33.3%

Interactions

[Figure: grid of pairwise interaction plots for the 8 numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
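The recipe above can be sketched in plain Python. It ranks each variable, giving tied values the average of their positions, and then applies Pearson's formula to the ranks. The helper names are illustrative. The example compares a nonlinear but strictly increasing relationship (y = x²), for which ρ is 1 while r is not.

```python
import math


def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/(n-1) factors cancel, so plain sums of deviations are used.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]          # nonlinear but strictly increasing

print(round(spearman(x, y), 6))  # 1.0
print(round(pearson(x, y), 6))   # 0.981105
```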

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (for positive scale factors), so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
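The sketch below computes r as the covariance divided by the product of standard deviations. It then checks the invariance property on small illustrative samples:

- shifting and rescaling each variable by a positive factor leaves r unchanged;
- exactly linear data gives r = +1 or -1, however steep the line.

```python
import math


def pearson(x, y):
    """Covariance over the product of standard deviations (the 1/(n-1) factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]
r = pearson(x, y)

# Separate changes of location and (positive) scale leave r unchanged.
r_shifted = pearson([10 * a - 3 for a in x], [0.5 * b + 100 for b in y])

# For an exactly linear relationship only the sign of the slope matters.
r_shallow = pearson(x, [0.1 * a + 1 for a in x])
r_steep = pearson(x, [50 * a - 7 for a in x])
r_falling = pearson(x, [-2 * a for a in x])

print(round(r, 6), round(r_shifted, 6))
print(round(r_shallow, 6), round(r_steep, 6), round(r_falling, 6))  # 1.0 1.0 -1.0
```

A negative scale factor would flip the sign of r, which is why the invariance statement is limited to positive factors.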

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
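The definition above is Kendall's τ-a. Below is a direct O(n²) sketch that treats tied pairs as neither concordant nor discordant. Library implementations may differ: SciPy's `scipy.stats.kendalltau`, for instance, computes τ-b by default, which adjusts the denominator for ties. The helper name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau_a(x, y))  # 0.4  (7 concordant, 3 discordant, 10 pairs)
```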

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The phik project provides extensive documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
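Nullity correlation can be sketched as the Pearson correlation between per-column missingness indicators (1 = missing, 0 = present):

- +1 means two columns are always missing together;
- -1 means one column is present exactly when the other is missing.

The columns below are hypothetical. The sketch does not reproduce any filtering the report may apply before drawing its heatmap.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return None  # a never-missing (or always-missing) column has no nullity correlation
    return cov / (sx * sy)


# Hypothetical columns (None = missing); these are not rows from this dataset.
columns = {
    "A": [None, 1.0, None, 2.0, 3.0, None],
    "B": [None, 5.0, None, 6.0, 7.0, None],  # missing exactly when A is missing
    "C": [4.0, None, 8.0, None, None, 9.0],  # missing exactly when A is present
    "D": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],     # never missing
}
nullity = {name: [1 if v is None else 0 for v in col] for name, col in columns.items()}

for a, b in [("A", "B"), ("A", "C"), ("A", "D")]:
    r = pearson(nullity[a], nullity[b])
    print(a, b, None if r is None else round(r, 3))  # A B 1.0 / A C -1.0 / A D None
```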
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

     Pregnancies  Glucose  BloodPressure  SkinThickness     Insulin        BMI  DiabetesPedigreeFunction   Age  Outcome
  0          6.0    148.0      72.000000      35.000000   79.799479  33.600000                     0.627  50.0      NaN
  1          1.0     85.0      66.000000      29.000000   79.799479  26.600000                     0.351  31.0      0.0
  2          8.0    183.0      64.000000      20.536458   79.799479  23.300000                     0.672  32.0      NaN
  3          1.0     89.0      66.000000      23.000000   94.000000  28.100000                     0.167  21.0      0.0
  4          0.0    137.0      40.000000      35.000000  168.000000  43.100000                       NaN  33.0      NaN
  5          5.0    116.0      74.000000      20.536458   79.799479  25.600000                     0.201  30.0      0.0
  6          3.0     78.0      50.000000      32.000000   88.000000  31.000000                     0.248  26.0      NaN
  7         10.0    115.0      69.105469      20.536458   79.799479  35.300000                     0.134  29.0      0.0
  8          8.0    125.0      96.000000      20.536458   79.799479  31.992578                     0.232  54.0      NaN
  9          4.0    110.0      92.000000      20.536458   79.799479  37.600000                     0.191  30.0      0.0

Last rows

     Pregnancies  Glucose  BloodPressure  SkinThickness     Insulin        BMI  DiabetesPedigreeFunction   Age  Outcome
603          1.0    128.0           88.0      39.000000  110.000000       36.5                     1.057  37.0      NaN
604          7.0    137.0           90.0      41.000000   79.799479       32.0                     0.391  39.0      0.0
605          0.0    123.0           72.0      20.536458   79.799479       36.3                     0.258  52.0      NaN
606          1.0    106.0           76.0      20.536458   79.799479       37.5                     0.197  26.0      0.0
607          9.0    170.0           74.0      31.000000   79.799479       44.0                     0.403  43.0      NaN
608          9.0     89.0           62.0      20.536458   79.799479       22.5                     0.142  33.0      0.0
609          2.0    122.0           70.0      27.000000   79.799479       36.8                     0.340  27.0      0.0
610          5.0    121.0           72.0      23.000000  112.000000       26.2                     0.245  30.0      0.0
611          1.0    126.0           60.0      20.536458   79.799479       30.1                     0.349  47.0      NaN
612          1.0     93.0           70.0      31.000000   79.799479       30.4                     0.315  23.0      0.0